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Versatile approach to access the low temperature thermodynamics of lattice polymers 

and proteins 
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We show that Wang-Landau sampling, combined with suitable Monte Carlo trial moves, provides 
a powerful method for both the ground state search and the determination of the density of states 
for the hydrophobic-polar (HP) protein model and the interacting self-avoiding walk (ISAW) model 
for homopolymers. We obtained accurate estimates of thermodynamic quantities for HP sequences 
with > 100 monomers and for ISAWs up to > 500 monomers. Our procedure possesses an intrinsic 
simplicity and overcomes the limitations inherent in more tailored approaches making it interesting 
for a broad range of protein and polymer models. 
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Coarse-grained polymer and protein models play an 
important role in understanding physical phenomena 
such as e. g. protein folding or the phase behavior of flex¬ 
ible macromolecules, and Monte Carlo simulation meth¬ 
ods have become an indispensable tool for the study of 
such models [l|. One of the most prominent examples is 
the hydrophobic-polar (HP) lattice model 0, where the 
protein is represented as a self-avoiding chain of beads 
(the amino acid residues) on a lattice. The amino acids 
are divided into two classes - hydrophobic (H) and polar 
(P) - and an attractive interaction e acts between non- 
bonded neighboring H residues mimicking the hydropho¬ 
bic force {€hh = —^^^hp,pp = 0). The special case of 
a chain consisting entirely of H residues (homopolymer), 
the interacting self-avoiding walk (ISAW), is an impor¬ 
tant model for studying the statistical physics of poly¬ 
mers [1, Q- 

Despite their formal simplicity and minimalistic frame¬ 
work, lattice models represent a challenging testing 
ground for computational methods because of their com¬ 
plex energy landscapes, conformational constraints and 
dense packings. The HP model has become a standard 
for assessing the efficiency of folding algorithms, and 
numerous - some very tailored - conformational ground 
state search strategies have been proposed, see e. g. 
and references therein. 

More revealing than algorithms that merely search for 
low energy states, however, are methods which target the 
sampling of the entire conformation and energy space. 
They can provide an estimate of the density of states 
(DOS) g{E) of energy E which, in turn, gives access 
to thermodynamic properties (e. g. internal energy, spe¬ 
cific heat, entropy or free energy) of a system at any 
temperature Only a few attempts have been under¬ 
taken to this end for the HP model, the most notable ap¬ 
proaches being multi-self overlap ensemble Monte Carlo 
(MSOE) [T^, multicanonical chain growth (MCCG) f\A\ . 


and equi-energy sampling (EES) [l2[. Although inven¬ 
tive and powerful, these methods also suffer from severe 
limitations: Large memory needs for keeping track of 
all sampled conformations (construction of microcanon- 
ical ensembles) (EES); (quasi-) statics, i. e. one bead of 
the chain is permanently fixed in space (MCCG); or the 
necessity to treat an expanded ensemble resulting in a 
large amount of computer time spent in sampling non¬ 
physical space (MSOE). Such restrictions can become in¬ 
creasingly important for more complex biological setups 
such as multi-chain systems or protein folding in hetero¬ 
geneous environments (e. g. membranes) [l^ . 

In this Letter we show that a generic algorithm - Wang- 
Landau sampling 0 - together with appropriate Monte 
Garlo trial moves, provides a powerful, yet flexible 
methodology for the simulation of HP-like lattice pro¬ 
teins and homopolymers that does not suffer from any of 
the above limitations. 

The key to our approach is the combination of two 
“non-traditional” Monte Garlo trial moves, which com- 
ffiement one another extremely well, namely pull moves 
Q and bond-rebridging moves [iBl, see Eig. [TJ Origi¬ 
nally proposed for the square and simple cubic lattices 
only, here we extended both types of trial moves to any 
n-dimensional space {n > 2). (i) Pull moves @ allow 

for the close-fitting motion of a polymer chain within 
a confining environment by “puiiing” portions of the 
poiymer to unoccupied neighboring sites. Puii moves 
are reversibie and fuifiii ergodicity; moreover, they pro¬ 
vide a good baiance between iocai and giobai conforma- 
tionai changes, as weii as a “naturai” dynamics of foiding. 
These features are important to an aigorithm that seeks 
to sampie the entire conformationai space such as Wang- 
Landau sampiing and thus requires an efficient move for 
the continuous foiding and unfoiding of the poiymer. (ii) 
Bond-rebridging moves [l5|: Triai moves which dispiace 
monomers become ineffective for very compact confor¬ 
mations where few unoccupied neighboring sites remain 
avaiiabie. In contrast, bond-rebridging moves aiiow the 
poiymer to change its conformation even at highest den- 
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FIG. 1: Typical example of pull (a) and bond-rebridging (b) 
move in 2D. For details, see [a. [l^. 


sities by reordering bonds while leaving the positions 
of monomers unchanged. Moreover, they facilitate long 
range topological changes, e. g. entanglement, which oth¬ 
erwise require costly unfolding/folding processes. This 
later feature becomes particularly important when the 
sampling of the DOS is split up into energy subintervals 
as it substantially reduces the risk of “locking-out” con¬ 
formational space [l6|. During sampling, pull or bond¬ 
rebridging trial moves were selected randomly (usually 
with a 1 : 1 ratio for each type). Pull moves enabled 
us to sample the entire conformational space of long 
polymer chains which was not feasible with “traditional 
moves” only Q. Furthermore, the combination of bond¬ 
rebridging and pull moves provided a speed-up of a factor 
3 for the HP model and a factor of 10 for the IS AW as 
comparing with pull moves only. 

Wang-Landau (WL) sampling is an efficient and robust 
algorithm for the computation of the DOS for diverse sta¬ 
tistical physical systems, see im for details. To fulfill 
detailed balance in conjunction with pull moves, in this 
study the WL transition probability from a state A to a 
state B has been generalized to 


P{A B) = min 


f ^ 9 {Ea) ^ 

V ' 9{Eb) riA^BlnAj 


( 1 ) 


ua^b denotes the number of pull moves from A to H and 
ua the total number of possible pull moves from A; ub^a 
and riB correspondingly (here, tia^b = ^b^a because 
of reversibility). Selecting only within the list of possible 
pull moves (tia) also increases the dynamics for dense 
conformations as compared to a standard “trial and er¬ 
ror” procedure. In order to yield accurate and reliable 
DOS estimates over the entire energy range (including 
the lowest energies) we used a very stringent parameter 
set for all our simulations, i. e. final modification factor 
ln(/finai) = 10“^ and flatness criterion p = 0.8; statistical 
errors were always calculated from 15 independent DOS 
estimates (by means of a Jackknife analysis). 

The knowledge of the exact energy range is essential in 
the WL algorithm for the examination of the flatness of 
the histogram. Often, however, energy boundaries are a 
priori unknown, (hence the use of ground state search al¬ 
gorithms, e. g. for the HP model). To solve this dilemma, 
the following procedure proved to be most efficient: Ev¬ 
ery time a new energy level Enew is found, it is marked 
as “visited” and ^(Enew) is set to i- e. the minimum 
of g among all previously visited energy levels. The flat¬ 
ness of the histogram is checked for visited energy levels 



FIG. 2: (Golor online) Specific heat Cv/N, mean radius of 
gyration (Rg)/N (A, chain length), and mean Jaccard index 
(q) as a function of temperature (T) for HP sequence 2D 100b 
(top) and 3D 103 {bottom), respectively. 


only. With this self-adaptive procedure, new regions of 
conformational space can be explored while, at the same 
time, the current DOS estimate is further refined. 

First, we applied our procedure to various benchmark 
HP sequences found in the literature. Since heteropoly¬ 
mers with A < 50 no longer represent a significant chal¬ 
lenge and our results are in perfect agreement with pre¬ 
vious works, we restrict our presentation to two longer 
sequences which turned out to be particularly demand¬ 
ing, namely, a lOOmer in 2D (2D100b) and a 103mer in 
3D (3D103); for definitions of HP sequences, see e. g. [sj. 
The ground states of sequence 2D 100b are believed to 
have an energy E = —50 however, previous 

attempts to obtain the DOS over the entire energ y ra nge 


[—50, 0] within a single simulation have failed [l0|,ll2|. In 
contrast, with our approach we were able to achieve this 
with high accuracy. Fig. [2](^op) shows the resulting spe¬ 
cific heat Cv{T)/N , depicting a peak at T ^ 0.48 (coil- 
globule transition) and a very weak shoulder at T ^ 0.23 
(folding transition). Such two-step acquisition of the na¬ 
tive state has been observed in studies of realistic pro¬ 
tein models and is not restricted to lattice models. For 
sequence 3D 103, the lowest energy found so far was —57, 
achieved only by fragment regrowth Monte Carlo via 
energy-guided sequential sampling (FRESS) 0. With 
our approach, we discovered an even lower state with en¬ 
ergy —58. Moreover, we were also able to obtain the DOS 
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in the energy range [—57,0], within a single simulation, 
and with very high accuracy. It was nonetheless not pos¬ 
sible to determine the relative magnitudes of the ground 
state {E = —58) and 1st excited state {E = —57) DOS 
with high precision. Fig. ^{bottom) displays the specific 
heat for sequence 3D 103, manifesting a peak at T ^ 0.51 
and a shoulder at T « 0.27. We do not observe an ad¬ 
ditional ^ak in Cv at very low temperatures, contrarily 
to Ref. Uj. However, since only conformations with en¬ 
ergies down to ^ = — 56 were found and the estimated 
errors near that peak were rather large, we think that 
this finding was an artefact of insufficient sampling. In¬ 
deed, our Cv curves indicate that the folding transitions 
from unstructured globular conformations to the ground 
states are rather weak for both sequences - despite the 
difficulty in sampling their low energy regimes. 

By means of multicanonical sampling given our DOS esti¬ 
mates, we obtained the radius of gyration Rg Q and the 
Jaccard index q = max+ Cg)\Eg = min} 
which measures the structural similarity between any 
conformation s and the ground states g of an HP se¬ 
quence g denotes the number of common (native) 

H-H contacts between s and and Cg, Cg are the numbers 
of H-H contacts found only in s and respectively (the 
maximum stems from the possible degeneracy of ground 
states). Fig. [2] also shows the averages {Rg) and (g) for 
sequences 2D 100b and 3D 103 and illustrates the comple¬ 
mentary information in these two quantities. While {Rg) 
indicates the coil-to-globule collapse, (g) identifies the 
folding transition to the native state and thus may serve 
as a suitable structural order parameter for these kind 
of systems. In case of sequence 3D 103, the ground state 
{E = —58) was excluded from the sampling (due to the 
difficulty in finding this state) which results in (g) satu¬ 
rating at a rather low value (< 0.3) for T 0. This 
manifests the still large structural differences between 
conformations with E = —57 and the ground state. 
TABLE [J compares various methods in finding low en¬ 
ergy conformations and, if available, the DOS for com¬ 
mon benchmark HP sequences. We also included results 
from methods which were focused on the low tempera¬ 
ture range only, i. e. PRESS Q and the variants of PERM 
(pruned-enriched Rosenbluth method) Q and hence do 
not provide the entire DOS. Except for the longest se¬ 
quence (3D136), we could confirm all minimum energy 
states found previously. The superiority of ERESS for 
this sequence is the result of various “efficiency enhance¬ 
ments” towards low energy states (see 0) which become 
obviously the more effective the longer the chain length. 
However, they do not permit anymore a correct sampling, 
let alone an estimation of the DOS. 

As a second test of performance, we applied our method 
to the interacting self-avoiding walk (ISAW) represent¬ 
ing a homopolymer with nearest-neighbor attraction (e = 
— 1) on the square (sq, 2D) and simple cubic (sc, 3D) lat¬ 
tice. Unraveling the “phase transition” behavior of flex- 


TABLE I: Energy minima found by several methods for 
benchmark HP sequences in 2D and 3D. The first column 
names the sequence (dimension and length), see [^. In case 
of Wang-Landau sampling (WLS), numbers in parentheses 
denote that the DOS has been obtained down to this energy. 
Horizontal lines mean no data available. 


Seq. 

WLS 

EES 

MCCG 

MSOE 

PRESS*’ 

PERM*’ 

2D100a 

-48 

-48 

- 

-47 

-48 

-48 

2D100b 

-50 

-49 

- 

-50" 

-50 

-50 

3D88 

-72 (-69) 

- 

- 

- 

-72 

-69 

3D103 

-58“ (-57) 

- 

-56 

- 

-57 

-55 

3D124 

-75 (-74) 

- 

- 

- 

-75 

-71 

3D136 

-82 (-81) 

- 

- 

- 

-83 

-80 


“See e. g. dr 2 U 2 ldbdrubdblfldrbl 2 f 2 dr 2 dbrulbr 2 drfrul 2 dfurufdl- 
dflb 2 ufuf 2 rb 2 urb 2 rbluldfu 2 fd 3 brblbul 2 brdf 4 lf 2 dbld (encoded as 
sequence left[l], right [r], up[u], down[d], forward [f], backward [b]). 
^Ground state search only (no DOS estimate) 

“DOS not attained. 


ible macromolecules in the thermodynamic limit {N 
oo) by means of simple (lattice) models - such as e. g. 
the ISAW, the bond-fluctuation model or systems in the 
continuum - has been a challenge for decades 0, fisj . 
Although the 0 point (coil-globule transition) could be 
investigated well for polymer chains with N > 10 000 
monomers, our understanding of the ISAW at very low 
temperatures remains elusive. Due to the very dense 
packings resulting for this model, accurate estimates of 
thermodynamic quantities below Tq are difficult to ob¬ 
tain. In the most recent computational studies, only 
chains with A" < 125 in 3D (multicanonical chain-growth 
0) and N < 300 in 2D (adaptive WL sampling with 
reptation, but without the lowest energy states [l6|) 
could be investigated. 

With our generic approach we were able to obtain ac¬ 
curate DOS estimates for ISAWs up to chain lengths 
N = 400 (2D) and A = 512 (3D) over the entire en¬ 
ergy range (including ground states) and we could then 
determine reliable thermodynamic quantities even at low¬ 
est temperatures (T ^0), see Eig. [3l The possibility to 
compare the specific heat Cy (T) for various system sizes 
up to these chain lengths allowed us to draw interest¬ 
ing conclusions which apply for the ISAW on both the 
sq and sc lattice: At high T, the collapse transition {0 
point) indicates a clear phase transition manifested by 
cooperative structural rearrangements from the coil to 
the globular state and Cv{Tq) ^ oc for A ^ oc. At 
very low T, a pronounced peak appears due to various 
ground state excitations (here, the ground states form 
either regular squares or cubes). These excitations are 
induced by loeal rearrangements at the surface and there¬ 
fore, the magnitude of the peak decreases systematically 
with chain length (2D) or becomes constant to within sta¬ 
tistical errors bars (3D). The breaking up of the ground 
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FIG. 3: (Color online) Specific heat Cv/N as a function of 
temperature (T) for ISAWs of various chain lengths N on 
square {top) and simple cubic (bottom) lattice. Numbers in 
parentheses denote corresponding energy minima. Bottom 
rows: Representative structures at specific temperatures for 
iV = 64 (2D) and N = 125 (3D). 


state structure bears similarity to surface roughening on 
crystal facets, i. e. the formation of kinks and edges at 
the surface of a compact core without vacancies (indeed, 
bulk vacancies appear at much higher T only). At in¬ 
termediate temperatures, metastable (and chain length 
dependent) phases emerge but they gradually diminish 
for ^ oc. Most notably, the IS AW on the sq/sc lat¬ 
tice does not undergo a true crystallization transition as 
observed for other lattice and off-lattice polymer mod¬ 
els SEi. Once in the globular phase, the rigidity of 
the model does not permit a further cooperative effect 
(i. e. symmetry breaking) which would be necessary for 
such a transition. Whereas a variation of chain length 
(N 7^ “magic” number) has an influence on the magni¬ 
tude and the position of the excitation peak at low T, 
the overall thermodynamic scenario remains the same 
for sufficiently large N. Note that it was essential to 
have data for chains that were longer than other meth¬ 
ods could treat in order to ascertain the low T behavior 
of the IS AW in the thermodynamic limit. 

In summary we have shown that Wang-Landau sampling 
with suitable Monte Carlo trial moves (pull and bond¬ 


rebridging moves combined) offers a powerful solution 
for studying the thermodynamics of lattice homo- and 
heteropolymers even in the very demanding low temper¬ 
ature ranges of such models. A major advantage of our 
method is that it remains rather simple and flexible be¬ 
side its proven performance which has not been achieved 
earlier, by more elaborate attempts [iMilii- These 
features make it readily applicable to the study of com¬ 
plex biological phenomena such as e. g. protein aggre¬ 
gation or protein insertion into a membrane El. Since 
both trial moves are usable for lattice and off-lattice mod¬ 
els [i^, other systems with conformational constraints 
should also benefit from our self-adaptive WL procedure. 
We thank K. Binder and W. Paul as well as C. Gervais 
and D. T. Seaton for fruitful discussions. This work was 
supported in part by NSF Grant DMR-0810223. 


[1] K. A. Dill et aL, Protein Sci. 4, 561 (1995); A. Kolinski 
and J. Skolnick, Polymer 45, 511 (2004). 

[2] K. A. Dill, Biochemistry 24, 1501 (1985); K. F. Fan and 
K. A. Dill, Macromolecules 22 , 3986 (1989). 

[3] A. D. Sokal, in Monte Carlo and Molecular Dynamics 
Simulations in Polymer Science^ edited by K. Binder 
(Oxford University Press, New York, 1995), p. 47. 

[4] K. Binder and W. Paul, Macromolecules 41, 4537 (2008). 

[5] H. Frauenkron et al., Phys. Rev. Lett. 80, 3149 (1998); 
H.-P. Hsu et a/., Phys. Rev. E 68, 021113 (2003). 

[6] N. Lesh et al, in RECOMB (2003), p. 188. 

[7] R. Backofen and S. Will, Gonstraints 11, 5 (2006). 

[8] J. Zhang et al., J. Ghem. Phys. 126, 225101 (2007). 

[9] D. P. Landau and K. Binder, A Cuide to Monte Carlo 
Simulations in Statistical Physics (Gambridge University 
Press, Gambridge, UK, 2005), 2nd ed. 

[10] Y. Iba et al, J. Phys. Soc. Jpn. 67, 3327 (1998); 
G. Ghikenji et al., Phys. Rev. Lett. 83, 1886 (1999). 

[11] M. Bachmann and W. Janke, Phys. Rev. Lett. 91, 208105 
(2003); J. Ghem. Phys. 120, 6779 (2004); T. Prellberg 
and J. Krawczyk, Phys. Rev. Lett. 92, 120602 (2004). 

[12] S. G. Kou et al, J. Ghem. Phys. 124, 244903 (2006). 

[13] See e. g. P. M. Harrison et al., J. Mol. Biol. 286, 593 
(1999); R. Bonaccini and F. Seno, Phys. Rev. E 60, 7290 
(1999); L. Zhang et al., Biophys. Ghem. 133, 71 (2008). 

[14] F. Wang and D. P. Landau, Phys. Rev. Lett. 86, 2050 
(2001); Phys. Rev. E 64, 056101 (2001). 

[15] J. M. Deutsch, J. Ghem. Phys. 106, 8849 (1997). 

[16] A. G. Gunha-Netto et al, Phys. Rev. E 78, 055701 (R) 
(2008). 

[17] R. Fraser and J. 1. Glasgow, in ICANNCA (1) (2007), 
p. 758. 

[18] D. F. Parsons and D. R. M. Williams, J. Ghem. Phys. 
124, 221103 (2006); W. Paul et al, Phys. Rev. E 75, 
060801(R) (2007); D. Seaton et al, Gomput. Phys. Gom- 
mun. 180, 587 (2009). 

[19] T. Vogel et al, Phys. Rev. E 76, 061803 (2007). 

[20] P. V. K. Pant and D. N. Theodorou, Macromolecules 28, 
7224 (1995). 








